Gazetteer Preparation for Named Entity Recognition in Indian Languages
نویسندگان
چکیده
This paper describes our approaches for the preparation of gazetteers for named entity recognition (NER) in Indian languages. We have described two methodologies for the preparation of gazetteers1. Since the relevant gazetteer lists are more easily available in English we have used a transliteration based approach to convert available English name lists to Indian languages. The second approach is a context pattern induction based domain specific gazetteer preparation. This approach uses a domain specific raw corpus and a few seed entities to learn context patterns and then the corresponding name lists are generated by using bootstrapping.
منابع مشابه
CRF-based Named Entity Recognition @ICON 2013
This paper describes performance of CRF based systems for Named Entity Recognition (NER) in Indian language as a part of ICON 2013 shared task. In this task we have considered a set of language independent features for all the languages. Only for English a language specific feature, i.e. capitalization, has been added. Next the use of gazetteer is explored for Bengali, Hindi and English. The ga...
متن کاملA Survey of Named Entity Recognition in Assamese and other Indian Languages
Named Entity Recognition is always important when dealing with major Natural Language Processing tasks such as information extraction, question-answering, machine translation, document summarization etc so in this paper we put forward a survey of Named Entities in Indian Languages with particular reference to Assamese. There are various rule-based and machine learning approaches available for N...
متن کاملNamed Entity Recognition in Hindi using Maximum Entropy and Transliteration
(NER) system becomes challenging if proper resources are not available. Gazetteer lists are often used for the development of NER systems. In many resource-poor languages gazetteer lists of proper size are not available, but sometimes relevant lists are available in English. Proper transliteration makes the English lists useful in the NER tasks for such languages. In this paper, we have describ...
متن کاملMaximum Entropy Approach based Named Entity Recognition in Punjabi Language
Named Entity Recognition is the task of identifying and classifying named entities into some predefine categories like person, location, organization etc. NER is used in many applications like text summarization, text classification, question answering and machine translation systems etc. For English a lot of work has already been done in the field of NER, where capitalization is a major key fo...
متن کاملLearning-Based Named Entity Recognition for Morphologically-Rich, Resource-Scarce Languages
Named entity recognition for morphologically rich, case-insensitive languages, including the majority of semitic languages, Iranian languages, and Indian languages, is inherently more difficult than its English counterpart. Worse still, progress on machine learning approaches to named entity recognition for many of these languages is currently hampered by the scarcity of annotated data and the ...
متن کامل